\section{Our technique}
\label{sec-our-technique}

\subsection{\sicl{} object representation}
\label{sec-our-technique-object-representation}

A \sicl{} object is represented in one of three different ways:

\begin{itemize}
\item As an \emph{immediate} object where the object is stored in the
  pointer itself, with the appropriate tag bits.  Fixnums, characters
  and single floats are represented this way.
\item As a two-word block.  This is how \texttt{cons} cells are
  represented.
\item As a two-word block called a \emph{header} where the first word
  points to a class object, and the second word points to a sequence
  of words, called the \emph{rack}, that contains the slots of the
  object.  All objects other than immediates and \texttt{cons} cells
  are represented this way.  We call this representation a
  \emph{general instance}.
\end{itemize}

The first word of the rack contains a \emph{stamp} which is a unique
integer taken from the class when the instance was created.  The
stamps of the arguments to a generic function are used by the generic
dispatch technique to determine which effective method to execute.
The object representation and generic dispatch technique has been
described in detail previously
\cite{Strandh:2014:FGD:2635648.2635654}, but this short summary is
sufficient to understand our bootstrapping technique.

In the description of our technique, we use the word \emph{class} in a
general way, as an object that can be used as a model for the creation
of \emph{instances}.  Thus the word \emph{class} does not imply that
it is a class in the sense of the host \commonlisp{} implementation.
While this usage of the word \emph{class} may seem odd, recall that a
class is just an ordinary \commonlisp{} object that is passed as an
argument to \texttt{make-instance} and other functions called by it
which then returns a different object.  We exploit this idea by
supplying our own definition of \texttt{make-instance} in different
phases of the bootstrapping procedure.

Similarly, we use the word \emph{generic function} in a general way,
as an object that can be executed and that can have methods associated
with it, providing partial implementations of the generic function.
Again, while this usage of the word \emph{generic function} may seem
odd, recall that a generic function is simply an ordinary
\commonlisp{} object of type \texttt{funcallable-standard-object} for
which the ultimate definition (called the \emph{discriminating
  function}) is computed by combining partial definitions (the
\emph{methods}) associated with it.  We exploit this fact by providing
different representations of generic functions in different phases of
the bootstrapping procedure, and by supplying different versions of
\texttt{compute\--discriminating\--function}
adapted to each phase.
Thus,
a \emph{generic function} is not a generic function in the sense of
the host \commonlisp{} implementation.  However, during the
bootstrapping procedure, these objects are executable in the host
system, because they are instances of the host class
\texttt{funcallable-standard-object}.

\subsection{Environments for bootstrapping}

Our technique uses several first-class global environments
\cite{Strandh:2015:ELS:Environments} to create a graph of objects that
is isomorphic to the graph of objects to be written to the executable
file instantiating the target \commonlisp{} implementation.  By using
first-class global environments, we avoid the problems related to
packages and environments cited in \refSec{sec-cl-in-cl}.  The main
feature of our technique, though, is that we create the generic
functions and classes of the metaobject protocol \emph{first}.

The environments are filled with definitions mainly as a result of
loading files containing production \sicl{} code, though some code
specific to bootstrapping is required as discussed at the end of this
section.  This loading procedure uses the Eclector%
\footnote{https://github.com/robert-strandh/Eclector}
reader and the Cleavir compiler to produce intermediate code in the
form of a fairly conventional \emph{flow graph} of instructions.  The
Cleavir compiler takes a first-class global environment as an
argument, and uses this environment to search for definitions of
macros, classes, types, etc.  The resulting intermediate code is then
translated in two different ways:

\begin{enumerate}
\item Native target code is generated from it, and attached to host
  objects representing executable target objects such as ordinary
  functions, generic functions, and methods.%
  \footnote{We do not yet have a code generator for native executable
    code, so currently this part of the bootstrapping procedure is
    omitted.}
\item It is translated to a simple subset of \commonlisp{} code that
  accesses that same environment for definitions of functions and
  other objects.  This \commonlisp{} code is then compiled using the
  host compiler in order to make it executable in the host.
\end{enumerate}

The remainder of this section is concerned with how the
host-executable code is used in order to determine the graph of target
objects represented as an isomorphic graph of host objects.

\subsection{Definitions}

In preparation for the bootstrapping procedure, several first-class
global environments are created and filled with definitions of \sicl{}
macros.  The definitions of those macros reside in production \sicl{}
files.  Little or no special code is required for those definitions.

A number of host object types are used during bootstrapping, in
particular symbols, packages, \texttt{cons} cells, and integers.
However, when such an object is used as an argument to a \sicl{}
generic function, a special version of \texttt{class-of} assigns a
\sicl{} class object as its type.  Some of the host functions
operating on these kinds of objects are \emph{imported} to our
environments in preparation for the bootstrapping procedure.

To facilitate the description of our technique, we need some
definitions:

\begin{definition}
A \emph{host class} is a class in the host system.  If it is an
instance of the host class \texttt{standard-class}, then it is
typically created by the host macro \texttt{defclass}.
\end{definition}

\begin{definition}
A \emph{host instance} is an instance of a host class.  If it is an
instance of the host class \texttt{standard-object}, then it is
typically created by a call to the host function
\texttt{make-instance} using a host class or the name of a host class.
\end{definition}

\begin{definition}
A \emph{host generic function} is a generic function created by the
host macro \texttt{defgeneric}, so it is a host instance of the host
class \texttt{generic-function}.  Arguments to the discriminating
function of such a generic function are host instances.  The host
function \texttt{class-of} is called on some required arguments in
order to determine what methods to call.
\end{definition}

\begin{definition}
A \emph{host method} is a method created by the host macro
\texttt{defmethod}, so it is a host instance of the host class
\texttt{method}.  The class specializers of such a method are host
classes.
\end{definition}

\begin{definition}
A \emph{simple host instance} is a host instance that is neither a
host class nor a host generic function.
\end{definition}

\begin{definition}
An \emph{ersatz instance} is a target general instance (as defined in
\refSec{sec-our-technique-object-representation}) represented as a
host data structure, using a host standard object to represent the
\emph{header} and a host simple vector to represent the \emph{rack}.
In fact, in order for the ersatz instance to be callable as a function
in the host system, the header is an instance of the host class
\texttt{funcallable-standard-object}.
\end{definition}

\begin{definition}
An ersatz instance is said to be \emph{pure} if the class slot of the
header is also an ersatz instance.  An ersatz instance is said to be
\emph{impure} if it is not pure.  See below for more information on
impure ersatz instances.
\end{definition}

\begin{definition}
An \emph{ersatz class} is an ersatz instance that can be instantiated
to obtain another ersatz instance.
\end{definition}

\begin{definition}
An \emph{ersatz generic function} is an ersatz instance that is also a
generic function.  It is possible for an ersatz generic function to be
executed in the host system because the header object is an instance
of the host class \texttt{funcallable-standard-object}.  The methods
on an ersatz generic function are ersatz methods.
\end{definition}

\begin{definition}
An \emph{ersatz method} is an ersatz instance that is also a method.
\end{definition}

\begin{definition}
A \emph{bridge class} is a representation of a target class as a
simple host instance.  An impure ersatz instance has a bridge class in
the class slot of its header.  A bridge class can be instantiated to
obtain an impure ersatz instance.
\end{definition}

\begin{definition}
A \emph{bridge generic function} is a representation of a target
generic function as a simple host instance, though in order for it to
be executed by the host, it is an instance of the host function
\texttt{funcallable-standard-object}.

Arguments to a bridge generic function are ersatz instances.  The
bridge generic function dispatches on the
\emph{stamp}
\seesec{sec-our-technique-object-representation} of
its required arguments.

The methods on a bridge generic function are bridge methods.
\end{definition}

\begin{definition}
A \emph{bridge method} is a target method represented as a simple host
instance.  The class specializers of such a method are bridge classes.
The \emph{method function} of a bridge method is an ordinary host
function.
\end{definition}

\subsection{Bootstrapping phases}

\begin{figure}
\begin{center}
\inputfig{fig-mop-classes.pdf_t}
\end{center}
\caption{\label{fig-mop-classes}
Simplified diagram of MOP classes.}
\end{figure}

The essence of our technique consists of four phases ($1$ to $4$),
using six first-class global environments.  An initial phase $0$
imports host classes to environment $E_0$.  Only classes that are
required in phase $1$ are imported.  Classes \texttt{standard\--method},
\texttt{standard\--generic\--function}, and the class used to represent
slots
\texttt{standard\--direct\--slot\--definition} are imported with the
same.  Classes \texttt{standard\--class}, \texttt{built\--in\--class}, and
\texttt{funcallable\--standard\--class} in environment $E_0$ all
refer to one and the same host class, namely a subclass of the host
class
\texttt{funcallable\--standard\--class}.

In each phase $i > 0$,
three first-class global environments are involved, $E_{i-1}$,
$E_{i}$, and $E_{i+1}$.  Before phase $i$ starts, $E_{i-1}$ contains
classes to be instantiated during phase $i$, and $E_i$ contains
generic functions that are not involved in phase $i$, but that will be
used in phase $i+1$ to operate on the instances of the classes in
$E_{i-1}$.  Some of the generic functions in $E_i$ are accessor
functions containing methods that were automatically added as a result
of the classes in $E_{i-1}$ being defined.  Others are higher-level
functions that call those accessors to accomplish tasks such as
initialization of various metaobjects, class finalization, creation of
effective methods, and creation of discriminating functions.

A phase $i$ has two main steps:

\begin{enumerate}
\item Accessor generic functions are created in $E_{i+1}$ by loading
  \sicl{} production code containing \texttt{defgeneric} forms.  These
  generic functions are accessor functions for MOP classes and MOP
  generic functions.  These functions are created in $E_{i+1}$ rather
  than in $E_i$ so as to protect the existing functions in $E_i$ that
  are needed later.
\item Classes are created in $E_i$ by loading \sicl{} production code
  containing \texttt{defclass} forms.  As a result of the creation of
  these classes, methods are automatically added to the corresponding
  accessor generic functions in $E_{i+1}$.
\end{enumerate}

Depending on the phase, \sicl{} production code might be loaded before
the first step, between the two steps, or after the last step.

Four phases accomplish the creation of a number of objects, ending
with a complete set of ersatz objects.  The result of each phase is
illustrated by a separate figure.  In these figures, the shape of each
object illustrates its type as shown in \refFig{fig-objects}.

\begin{figure}
\begin{center}
\inputfig{fig-objects.pdf_t}
\end{center}
\caption{\label{fig-objects}
Objects in different phases.}
\end{figure}

The four phases accomplish this following results:

\begin{enumerate}
\item Host classes and host class metaclasses in $E_0$
  are used to create host generic functions in $E_2$ and host classes
  in $E_1$.  The result of this phase is illustrated in
  \refFig{fig-phase-1}.
\item Host classes in $E_1$ are used to create bridge generic
  functions in $E_3$ and bridge classes in $E_2$.  The result of this
  phase is illustrated in \refFig{fig-phase-2}.
\item Bridge classes in $E_2$ are used to create impure ersatz generic
  functions in $E_4$ and impure ersatz classes in $E_3$.  The result
  of this phase is illustrated in \refFig{fig-phase-3}.
\item Impure ersatz classes in $E_3$ are used to create pure ersatz
  generic functions in $E_5$ and pure ersatz classes in $E_4$.  The
  result of this phase is illustrated in \refFig{fig-phase-4}.
\end{enumerate}

\begin{figure}
\begin{center}
\inputfig{fig-phase-1.pdf_t}
\end{center}
\caption{\label{fig-phase-1}
Phase 1.}
\end{figure}

\begin{figure}
\begin{center}
\inputfig{fig-phase-2.pdf_t}
\end{center}
\caption{\label{fig-phase-2}
Phase 2.}
\end{figure}

\begin{figure}
\begin{center}
\inputfig{fig-phase-3.pdf_t}
\end{center}
\caption{\label{fig-phase-3}
Phase 3.}
\end{figure}

\begin{figure}
\begin{center}
\inputfig{fig-phase-4.pdf_t}
\end{center}
\caption{\label{fig-phase-4}
Phase 4.}
\end{figure}

The result of these phases is that the impure ersatz generic functions
in environment $E_4$ can operate on the pure ersatz generic function
in environment $E_5$ and on the pure ersatz classes in $E_4$.  But
they can also operate on impure ersatz objects, provided their call
caches contain entries for the corresponding stamps.  Filling the call
caches is the purpose of our \emph{satiation} technique
\cite{Strandh:2014:RMI:2635648.2635656}.

\subsection{Tying the knot}

At the end of these four phases, we have fully functional impure ersatz
generic functions in environment $E_4$, and fully functional impure
classes in environment $E_3$.  But we still do not have the cyclic graph
of metaobjects that a functioning \clos{} system requires.
Furthermore, there are still bridge generic functions
that might be called in order to operate on our impure ersatz
metaobjects.

To accomplish the conversion of this hierarchy of objects to a cyclic
graph, we need to modify the \texttt{class} slot of the headers of
each impure metaobject so that instead of referring to a bridge class,
it refers to an impure ersatz class.  This operation will transform
every impure ersatz metaobject into a pure ersatz metaobject.
However, there are a few more operations required to completely remove
all references to bridge metaobjects:

\begin{itemize}
\item Each ersatz metaobject contains a list of the effective slot
  definition metaobjects of its class as the second word of the rack.
  In an impure ersatz metaobject, those effective slot definitions are
  bridge objects.  Once the \texttt{class} field of the impure ersatz
  metaobject has been updated, this list must be updated to contain a
  reference to the list of the ersatz effective slot definitions from
  the new ersatz class.
\item Each ersatz generic function contains a slot containing the
  method class of the methods on this generic function.  In an impure
  ersatz generic function, this slot refers to a bridge class, so it
  must also be updated.
\end{itemize}

We must still find and update all impure ersatz metaobjects in the
system.  For classes and generic functions, this is trivial, as they
are all reachable from the first-class global environment they are
defined in.  For other object types such as methods, slot-definitions,
and method combinations, this is not the case.  They must be found by
a traversal of the class or generic function metaobject that they are
part of.  Such a traversal is straightforward.

Before the cyclic graph can be traversed and an isomorphic graph be
generated in a native executable file, additional definitions must be
loaded:

\begin{itemize}
\item Standard classes that are needed in order for the resulting
  native executable to be viable must be loaded.  In particular,
  definitions of classes such as \texttt{symbol}, \texttt{package},
  \texttt{cons}, \texttt{sequence}, \texttt{list}, \texttt{null},
  \texttt{number}, \texttt{rational}, \texttt{integer}, and
  \texttt{fixnum} are needed in order for it to be possible to load
  compiled code into the executing image.
\item Many standard functions are also needed, such as functions on
  packages, lists, hash tables, etc.  Functions that operate on
  first-class global environments are needed as well.
\item A simple version of the compiler must be loaded so that the
  resulting executable image can construct discriminating functions
  when definitions of generic functions and methods are loaded.
\end{itemize}

On the other hand, the garbage collector may not be needed in the
initial executable image, though the data structures that the garbage
collector works with must of course be present so that objects can be
laid out in memory.

%%  LocalWords:  accessors metaobjects accessor immediates metaobject
